-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-14036] Read Configuration for Pub/Sub SchemaTransform #17730
[BEAM-14036] Read Configuration for Pub/Sub SchemaTransform #17730
Conversation
Can one of the admins verify this patch? |
2 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Run Java PreCommit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
.../src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaTransformReadConfiguration.java
Show resolved
Hide resolved
.../src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaTransformReadConfiguration.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaTransformReadConfiguration.java
Show resolved
Hide resolved
* letter queue topic string. | ||
*/ | ||
@Nullable | ||
public abstract String getDeadLetterQueue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pls change to getDeadLetterTopic to be consistent with the Pub/Sub read transform.
Line 865 in d9436c4
public Read<T> withDeadLetterTopic(String deadLetterTopic) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chamikaramj (cc: @angoenka ) Thank you again for reviewing. May we consider leaving the name getDeadLetterQueue
?
In a previous use of AutoValueSchema with AutoValue in a different project, I observed it needed the getters to be named as get in order for the serialization to work. For example, I had a property called fooName. When I named the getter fooName(), the return value of fooName() was null when invoked in the context of a DoFn. However, when I changed the getter to getFooName() the return value was what I expected. I am not sure if my observation is still valid.
Adding to supporting get instead of with, the design goals of the configuration class are to hold data needed by its corresponding SchemaProvider. The method is not doing any action implied by the with preposition. The getter is simply getting data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I think the confusion here is that this property does not correspond to that PubSubIO.getDeadLetterTopic I referenced by maps to the dlq property below.
Line 81 in 408664b
public abstract boolean useDlq(); |
Is that correct ?
.../src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaTransformReadConfiguration.java
Show resolved
Hide resolved
* {@link org.apache.beam.sdk.schemas.io.payloads.PayloadSerializers}. | ||
*/ | ||
@Nullable | ||
public abstract String getFormat(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see format, protoClass, thriftClass attributes in the original Read config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chamikaramj (cc: @angoenka )
I saw https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java#L128. Additionally, the original Provider's https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/pubsub/PubsubSchemaIOProvider.java#L209 calls the config::serializer method. In said serializer method, the format, protoClass, and thriftClass attributes are referenced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These again map to PubSubSchemaIO not the original PubSubIO. Is that plan here to make Pub/Sub SchemaTransform just consistent with existing Pub/Sub SchemaIO ? I'm not sure.
@TheNeuralBit @dpcollins-google WDYT ? Should we just be consistent with Pub/Sub SchemaIO implementation here or should SchemaTransform be a different design ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These again map to PubSubSchemaIO not the original PubSubIO. Is that plan here to make Pub/Sub SchemaTransform just consistent with existing Pub/Sub SchemaIO ? I'm not sure.
@TheNeuralBit @dpcollins-google WDYT ? Should we just be consistent with Pub/Sub SchemaIO implementation here or should SchemaTransform be a different design ?
The plan and replicate like-for-like SchemaIO questions are critical and blocking design decisions that relates to this thread as well. I will hold off on any changes to this PR until we get the feedback needed. Thank you again.
Codecov Report
@@ Coverage Diff @@
## master #17730 +/- ##
==========================================
+ Coverage 74.00% 74.01% +0.01%
==========================================
Files 696 698 +2
Lines 91851 92224 +373
==========================================
+ Hits 67975 68263 +288
- Misses 22627 22710 +83
- Partials 1249 1251 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
* letter queue topic string. | ||
*/ | ||
@Nullable | ||
public abstract String getDeadLetterQueue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I think the confusion here is that this property does not correspond to that PubSubIO.getDeadLetterTopic I referenced by maps to the dlq property below.
Line 81 in 408664b
public abstract boolean useDlq(); |
Is that correct ?
* {@link org.apache.beam.sdk.schemas.io.payloads.PayloadSerializers}. | ||
*/ | ||
@Nullable | ||
public abstract String getFormat(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These again map to PubSubSchemaIO not the original PubSubIO. Is that plan here to make Pub/Sub SchemaTransform just consistent with existing Pub/Sub SchemaIO ? I'm not sure.
@TheNeuralBit @dpcollins-google WDYT ? Should we just be consistent with Pub/Sub SchemaIO implementation here or should SchemaTransform be a different design ?
@chamikaramj , @TheNeuralBit , @dpcollins-google , @pabloem If I don't see any activity on this PR in the next couple days, I will go ahead and close it to keep the PR space clean in this project. |
I think we can just be compatible with existing SchemaIO config for now and update in the future if needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
…7730) * Read Configuration for Pub/Sub SchemaTransform * Add idAttribute to Read Configuration * Add Experimental annotation/remove SuppressWarning
This PR is the first of four planned PRs. It contributes to BEAM-14036 by implementing a configuration for reading from Pub/Sub. There are no dependencies on this PR.
I would like to request the following to review this PR.
R: @angoenka
R: @chamikaramj
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.UpdateCHANGES.md
with noteworthy changes.If this contribution is large, please file an Apache Individual Contributor License Agreement.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.